A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis

Authors

  • Yijun Sun
  • Yunpeng Cai
  • Susan M. Huse
  • Rob Knight
  • William G. Farmerie
  • Xiaoyu Wang
  • Volker Mai
Abstract

Recent advances in massively parallel sequencing technology have created new opportunities to probe the hidden world of microbes. Taxonomy-independent clustering of the 16S rRNA gene is usually the first step in analyzing microbial communities. Dozens of algorithms have been developed in the last decade, but a comprehensive benchmark study is lacking. Here, we survey algorithms currently used by microbiologists, and compare seven representative methods in a large-scale benchmark study that addresses several issues of concern. A new experimental protocol was developed that allows different algorithms to be compared using the same platform, and several criteria were introduced to facilitate a quantitative evaluation of the clustering performance of each algorithm. We found that existing methods vary widely in their outputs, and that inappropriate use of distance levels for taxonomic assignments likely resulted in substantial overestimates of biodiversity in many studies. The benchmark study identified our recently developed ESPRIT-Tree, a fast implementation of the average linkage-based hierarchical clustering algorithm, as one of the best algorithms available in terms of computational efficiency and clustering accuracy.
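To make the role of the distance level concrete, the following is a minimal sketch of naive average-linkage hierarchical clustering cut at a chosen threshold. It is not ESPRIT-Tree or any other method benchmarked in the study. The function name `average_linkage_clusters`, the toy distance matrix, and the three distance levels are all assumptions made for illustration.

```python
from itertools import combinations


def average_linkage_clusters(dist, threshold):
    """Naive O(n^3) average-linkage agglomerative clustering (illustrative only).

    dist: symmetric matrix (list of lists) of pairwise distances.
    threshold: merging stops once the closest pair of clusters is
    farther apart than this distance level.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # Average linkage: mean of all pairwise distances between two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        (x, y), d = min(
            (((p, q), avg_dist(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise distances between five reads (not data from the study).
toy = [
    [0.00, 0.02, 0.04, 0.20, 0.22],
    [0.02, 0.00, 0.03, 0.21, 0.23],
    [0.04, 0.03, 0.00, 0.19, 0.21],
    [0.20, 0.21, 0.19, 0.00, 0.05],
    [0.22, 0.23, 0.21, 0.05, 0.00],
]

for level in (0.01, 0.03, 0.10):
    print(level, average_linkage_clusters(toy, level))
```

On this toy input the number of clusters falls from five to four to two as the distance level is relaxed, which illustrates why the choice of level can change a diversity estimate.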

Similar resources

A Multiagent Reinforcement Learning algorithm to solve the Community Detection Problem

Community detection is a challenging optimization problem that consists of searching for communities that belong to a network under the assumption that the nodes of the same community share properties that enable the detection of new characteristics or functional relationships in the network. Although there are many algorithms developed for community detection, most of them are unsuitable when ...

Employing Nonlinear Response History Analysis of ASCE 7-16 on a Benchmark Tall Building

ASCE 7-16 has provided a comprehensive platform for the performance-based design of tall buildings. The core of the procedure is based on nonlinear response history analysis of the structure subjected to recorded or simulated ground motions. This study investigates consistency in the ASCE 7-16 requirements regarding the use of different types of ground motions. For this purpose performance of a...

A HYBRID MODIFIED GENETIC-NELDER MEAD SIMPLEX ALGORITHM FOR LARGE-SCALE TRUSS OPTIMIZATION

In this paper a hybrid algorithm based on exploration power of the Genetic algorithms and exploitation capability of Nelder Mead simplex is presented for global optimization of multi-variable functions. Some modifications are imposed on genetic algorithm to improve its capability and efficiency while being hybridized with Simplex method. Benchmark test examples of structural optimization with a...

The metabolic footprint of the airway bacterial community in cystic fibrosis

BACKGROUND Progressive, chronic bacterial infection of the airways is a leading cause of death in cystic fibrosis (CF). Culture-independent methods based on sequencing of the bacterial 16S rRNA gene describe a distinct microbial community that decreases in richness and diversity with disease progression. Understanding the functional characteristics of the microbial community may aid in identify...

Microbial Community Analysis Using MiSeq Sequencing and Pathway of Methane Production in Tehran WWTP: A Full-Scale Anaerobic Digester

Introduction: Anaerobic digestion (AD) is a biological wastewater treatment method used both to digest waste activated sludge and to produce methane. It is believed to be the most effective solution in terms of energy crisis and environmental pollution issues. Materials and Methods: In this study the sludge was digested anaerobically sampled from a full-scale WWTP, located at sou...

Journal title:
  • Briefings in bioinformatics

Volume 13, Issue 1

Pages: -

Publication date: 2012